Short-term vital parameter forecasting in the intensive care unit: A benchmark study leveraging data from patients after cardiothoracic surgery

Patients in an Intensive Care Unit (ICU) are closely and continuously monitored, and many machine learning (ML) solutions have been proposed to predict specific outcomes like death, bleeding, or organ failure. Forecasting of vital parameters is a more general approach to ML-based patient monitoring, but the literature on its feasibility and robust benchmarks of achievable accuracy are scarce. We implemented five univariate statistical models (the naïve model, the Theta method, exponential smoothing, the autoregressive integrated moving average model, and an autoregressive single-layer neural network), two univariate neural networks (N-BEATS and N-HiTS), and two multivariate neural networks designed for sequential data (a recurrent neural network with gated recurrent unit, GRU, and a Transformer network) to produce forecasts for six vital parameters recorded at five-minute intervals during intensive care monitoring. Vital parameters were the diastolic, systolic, and mean arterial blood pressure, central venous pressure, peripheral oxygen saturation (measured by non-invasive pulse oximetry) and heart rate, and forecasts were made for 5 through 120 minutes into the future. Patients used in this study recovered from cardiothoracic surgery in an ICU. The patient cohort used for model development (n = 22,348) and internal testing (n = 2,483) originated from a heart center in Germany, while a patient sub-set from the eICU collaborative research database, an American multicenter ICU cohort, was used for external testing (n = 7,477). The GRU was the predominant method in this study. Uni- and multivariate neural network models proved to be superior to univariate statistical models across vital parameters and forecast horizons, and their advantage steadily became more pronounced for increasing forecast horizons. With this study, we established an extensive set of benchmarks for forecast performance in the ICU. Our findings suggest that supplying physicians with short-term forecasts of vital parameters in the ICU is feasible, and that multivariate neural networks are most suited for the task due to their ability to learn patterns across thousands of patients.


Introduction
Postoperative care in an intensive care unit (ICU) is essential for patients undergoing cardiothoracic surgery [1].Due to the strain on patients' cardiovascular system caused by the procedure and the underlying health condition, complications can arise which require extensive pharmacological support [2], re-admission to surgery, or organ replacement therapy [3].Therefore, continuous monitoring of patients' vital parameters is crucial to allow timely and well-targeted interventions.
The high density of structured patient data makes the ICU a prime use case for the implementation of machine learning (ML) solutions, and the feasibility of predicting events like intra-hospital mortality [3][4][5], length of ICU stay [4], post-operative bleeding [3], renal failure [3], delirium [6] or threshold alarms [7] is well-documented (see [8][9][10] for reviews).However, due to the multitude of potential complications, a limited number of prediction models for specific adverse events might not be able to capture all threats to the patient's recovery, while further adding to the number of parameters in need of evaluation by the medical staff.
Forecasting vital parameter progression curves instead of specific events offers a more generalizable approach to ML-assisted monitoring in the ICU.Displaying forecasts for upcoming vital parameter values could benefit physicians and patients through early indication of improvements or deteriorations, and could aid with risk-stratification, adjustment of medication, or discharge decisions based solely on parameters which are already constantly and routinely monitored.
contacting the Clinical Data Science working group of Deutsches Herzzentrum der Charite ´(officemeyer@dhzc-charite.de).Access to the eICU collaborative research database ( Despite its potential to enhance patient monitoring, there is currently a lack of research on forecasting vital parameters in the ICU.Available studies are limited in terms of sample size, or external validation, and report only on a limited set of vital parameters and forecast horizons [11,12].Moreover, to the best of our knowledge, no studies comparing multiple forecasting methodologies of differing complexity to identify the most suitable approaches in the ICU have been published to date.
While the dominance of modern ML techniques over conventional statistical models has been well established for applications such as image recognition or natural language processing, ML has only in recent years made it into the methodological mainstream of time series forecasting [13].While recent results, e.g. in forecasting competitions, show that well-designed ML models can be employed highly effectively for forecasting, far simpler and computationally cheaper methods like exponential smoothing remain competitive [13,14].For the task of vital parameter forecasting in the ICU, it therefore remains to be established whether an increase in model complexity yields better forecasting performance.
With this work, we intend to establish robust benchmarks of forecast accuracy in postoperative cardiothoracic intensive care.For six essential and routinely captured vital parameters recorded at five-minute intervals, we report and compare forecast performance 5 to 120 minutes into the future, using forecasting models differing in complexity from univariate statistical methods to multivariate neural networks designed for sequential data.We hypothesize that high-quality forecasting of vital parameters in the intensive care unit is feasible, and that the forecast quality of modern neural network techniques is superior to that of univariate statistical methods due to their ability to perform cross-patient learning and pattern detection using data from thousands of ICU stays at once.

Study populations
We used two data sources in this study: an internal dataset used for model development, hyperparameter optimization and internal testing, and an external dataset exclusively used for model testing.
Model development, hyperparameter optimization and internal testing were performed on an anonymized single-center dataset from a German quaternary cardiovascular center.It includes vital parameters recorded at one-minute intervals during ICU stays of patients following cardiothoracic surgery between October 2012 and August 2022.
For external testing of our proposed methods, we used a sub-set of patients from the eICU Collaborative Research Database, a multi-center database containing intensive care data at five-minute intervals from the United States collected in 2014 and 2015 [15].
Vital parameters of interest were systolic, diastolic, and mean blood pressure, central venous pressure, heart rate and peripheral oxygen saturation (measured via non-invasive pulse oximetry).Other data sources like laboratory values or medication were not part of this study.
For compatibility of the two data sources, we down-sampled vital parameters in the internal single-center cohort to five-minute intervals using the median.To avoid data leakage during down-sampling, we kept the first timepoint (t 0 ) and value untouched, and defined values at later timepoints (t 0 + 5 minutes, t 0 + 10 minutes . ..) as the median of the current and four preceding observations.Inclusion criteria were identical for both data sources.We included only adult postoperative patients aged 18 or older, admitted to a cardiothoracic surgery ICU.We required at least one hour of recorded ICU data per patient, and included only the first 24 hours of their first ever recorded ICU admission.Re-admissions were excluded from the analysis.Thus, each ICU admission corresponds to one unique patient and vice versa.
On the internal single-center cohort, we performed a random 8:1:1 split across patients into a training set, a hyperparameter validation set, and an internal test set.
We obtained permission from the Charite ´-Universita ¨tsmedizin Berlin ethics committee to retrospectively analyze the single-center data (approval number EA1/202/23) and obtained access to the eICU database.

Data cleaning
Using pre-defined bounds of plausibility for each vital parameter (Table 1), we removed values exceeding these limits and considered them missing.In the internal dataset, this was done prior to down-sampling.The start of each patient's multivariate vital parameter time-series was defined as the first timepoint with at least one valid non-missing vital parameter.Per vital parameter, we imputed missing values as follows: we used forward-filling for at most three consecutive timepoints where a previous value of the same patient was known.If more than three consecutive timepoints were missing or no previous value was known, we imputed using non-missing values of the other vital parameters through scikit-learn's iterative imputing strategy with linear regression models to predict missing from available vital parameters at a given timepoint.[16]

Forecast methodology
We implemented nine different forecasting models from three model classes (Table 2): five univariate statistical models, two univariate neural networks (N-BEATS [17] and N-HiTS [18]), and two multivariate neural networks, specifically a recurrent neural network with gated recurrent unit (GRU) [19] and a Transformer architecture specifically designed for time-series tasks [20].
Fig 1 schematically depicts a synthetic example of our forecasting setup for three successive forecasts produced by one method for a single vital parameter.Every five minutes, models produced forecasts for 5, 10, 15, . .., 120 minutes into the future.Forecasts were always based on the entire observed past up to the current time point.Thus, the length of the forecast window remained fixed at 120 minutes, but the amount of past values on which predictions were based depended on the time point of prediction.
Univariate statistical models were newly trained from scratch at every time point per vital parameter and patient, which allowed models to change with each new observation.In contrast, for uni-and multivariate neural networks, we trained single, immutable models on all patients observed in the training set.For test set predictions, at each time point per patient, these immutable models were presented with the vital parameters up to the point in question, from which forecasts were computed.Per patient and vital parameter, we produced forecasts every five minutes starting with the sixth, and ending with the penultimate timepoint.Thus, univariate statistical models were given at least five past observations to train on, and all forecasts could be evaluated on at least one true value.
As the only manual adjustment of forecasts produced by our models, forecasts exceeding the pre-defined bounds of plausibility (Table 1) were capped at the corresponding limit.

Univariate statistical models
Due to the non-seasonal and non-cyclic nature of the studied vital parameters at five-minute intervals, we restricted the choice of univariate statistical models to models without explicit components for these types of patterns.We implemented five univariate statistical forecasting models: the naïve method (constant forecast using the most recent observed value, hereafter referred to as Naive), the Theta method [21], exponential smoothing (ETS) [22], the autoregressive integrated moving average (ARIMA) model [23], and an autoregressive single-layer neural network (AR NNet) [23].All five methods were implemented using R's forecast package [24].Model specifications were iteratively tested on the validation set to optimize performance in terms of the root mean squared error, but also to avoid any numerical errors during model fitting or at prediction time.
If at least 24 past observations were available for a given vital parameter, we used Box-Coxtransformation [25] on the time series with a transformation parameter between zero and two, using the implementation in the forecast package [24].Predictions were inverse-transformed to retain the original scale.
For the ETS model, we defined the following potential model specifications and let the forecast package chose the optimal specification based on the corrected Akaike Information Criterion (AICc).We used only non-seasonal ETS models with either additive or multiplicative error type, and either additive, multiplicative or no trend.Further, we allowed the trend to be either damped or non-damped.[22][23][24] The potential ARIMA model specifications were as follows: We allowed no seasonal model components or seasonal differencing.Non-seasonal first-order differencing was an option.We restricted the autoregressive order and the moving average order to 1, 2 or 3, each, with their sum not exceeding 4. The optimal model was chosen based on the AICc.[23,24] The AR NNet was of non-seasonal autoregressive order 6, included no seasonal autoregressive term, a hidden size of 18 and a weight decay of 2. We emphasized recent observations through exponentially rising observation weights.The final forecasts were the result of fitting five models with different initial model weights and averaging their individual results.[23,24]  We implemented the Theta model, a combination of simple exponential smoothing with a linear trend obtained through simple regression [26], using the forecast package, but manually overrode checks for seasonality to force unseasonal models [24].

Univariate neural network models
N-BEATS (short for Neural Basis Expansion Analysis for interpretable Time Series forecasting) [17] and N-HiTS (Neural Hierarchical Interpolation for Time Series forecasting) [18], two state-of-the-art univariate forecasting models based on multiple stacks of multi-layer perceptrons, were implemented using the Neural Forecast [27] python package.Given a set of potential hyperparameters per model, the package provides automatic selection and fitting of the best-performing model using a rolling window approach which continually validates model performances.
Per vital parameter, both models were fitted on the training set.Using the mean and standard deviation estimated from the training set, vital parameters were standardized to exhibit mean zero and standard deviation one prior to model fitting, and de-standardized post prediction.Model training lasted 100 epochs in batches of size 128.Per vital parameter and model, 25 hyperparameter configurations were tested using the HyperOpt [28] search algorithm.
For both models, hyperparameter choices were identical for the following three parameters: the learning rate (between 5�10 −5 and 5�10 −3 in steps of 10 −5 ), the number of hidden multilayer perceptron units (32, 64 or 128) and the number of past timepoints used as model input size (12, 24, 48 or 72).Additional hyperparameters were the polynomial degree for trend modeling in the N-BEATS model (choices 2, 3 or 4), and dropout (between 0.0 and 0.4 in steps of 0.01) in the N-HiTS model.

Multivariate neural network models
We implemented both the GRU and the Transformer using PyTorch [29].Models with different hyperparameter choices were fitted on the training set, and their performance was evaluated on the validation set.Vital parameter standardization was performed as described above for univariate neural network models.The models were designed to output forecasts for all 24 forecast horizons (5 through 120 minutes) simultaneously.
For the Transformer, we used the time-series architecture proposed by Zerveas et al (2021) [20].In addition to the hyperparameters used for training the GRU, it required hyperparameter choices for the activation function (rectified linear unit or Gaussian error linear unit), positional encoding (fixed or learnable), model dimension (64, 128, 256 or 512), and the number of transformer heads (8 or 16).
Hyperparameter optimization was implemented using the Ray package [30].We used the HyperOpt search algorithm [28] with trials set up and ended prematurely in case of poor performance after at least 10 epochs using the Hyperband algorithm [31].Models were trained for up to 100 epochs with training data presented in batches of 128 patients.50 different hyperparameter combinations were tried to find the most suitable model.

Statistical analysis
Forecasts were compared to the known ground truth as displayed in Fig 1 for all forecast horizons from 5 to 120 minutes into the future.For all six vital parameters and all forecast horizons, we computed the root mean squared error (RMSE) to evaluate the model accuracy.If the ground truth was unknown (and the vital parameter subsequently imputed to allow model fitting), the observation was not counted towards the RMSE.We used repeated bootstrapping on the patient level with 250 iterations to compute 95% confidence intervals.As additional error metrics reported in the appendix, we computed the mean absolute error (MAE) and the mean absolute percentage error (MAPE).
To summarize model performances in a single metric, we further computed the MAPE, a scale-less error metric, across all vital parameters and forecast horizons.95% confidence intervals for MAPE were again computed via bootstrapping.
A code repository containing the relevant software behind this project can be found at https://github.com/nhinrichsberlin/icu-vital-parameter-forecasting.

Study populations
We identified 24 831 patients in the internal dataset, out of which 19 865 were randomly allocated to the training set, 2 483 into the validation set, and 2 483 to the internal test set.The relevant cohort from the eICU database used for external testing consisted of 7 477 patients (Table 3).
Patients' ages were similar across datasets, with a median of 68 years.However, the prevalence of male sex was higher in the internal sets (69.1, 68.4,70.1% in training, validation, test set, respectively) compared with the eICU dataset (63.9%).For systolic, diastolic, and mean blood pressure, as well as peripheral oxygen saturation, there were no major discrepancies in the respective distribution between the populations.The external test set exhibited fewer patients with low central venous pressure (lower quartile 6.0 in the internal cohorts vs. 8.0 in the external test set), and, on average, lower heart rates (median 89.0 in the internal cohorts vs. 80.0 in the external test set).A detailed comparison of vital parameter distributions across cohorts is depicted in S1 Fig.
In the internal dataset, missing vital parameters were scarce, ranging from 1.8% for oxygen saturation to 3.1% for the heart rate.In the eICU cohort, the rate of missing values was considerably higher for systolic, diastolic, and mean blood pressure (~21% each) as well as central venous pressure (52.2%).Details are given in S1 Table.

Comparison of forecast quality
An example of a single forecast run of four models at a single timepoint is illustrated in Fig 2 .The mean absolute percentage error across all vital parameters and forecast horizons is displayed in Figs 3 (internal test set) and 4 (external test set).Root mean squared errors per vital parameter, forecast model and selected forecast horizons are displayed in Figs 5 (internal test set) and 6 (external test set).A complete summary of performances for all vital parameters and forecast horizons using three accuracy metrics (RMSE, MAE and MAPE) can be found in S2 Table .Optimal hyperparameters of the uni-and multivariate neural network models are listed in S3 (Transformer), S4 (GRU), S5 (N-BEATS) and S6 (N-HiTS) Tables.For all methods and vital parameters, forecast errors increase with increasing forecasting horizon (Figs 5 and 6).All four implemented neural network models display major improvements over conventional, univariate statistical models.The GRU, however, stands out as the best performing methodology implemented in this study.Its mean absolute percentage error across all vital parameters and forecast horizons is the lowest among all models, both in the internal (Fig 3) and the external test set (Fig 4).The Transformer exhibits the second-best performance both in the internal and external test sets.Thereby, multivariate neural network models claim the top spot among the model classes implemented in this study, followed by univariate neural networks, and lastly univariate statistical models.However, the gap between uni-and multivariate neural network models is not as pronounced as the gap between univariate neural networks and univariate statistical models.
The advantages of the neural network models become more pronounced with increased forecasting horizon (Figs 5 and 6).When forecasting only five minutes into the future, the relative improvement compared to simpler forecasting methods is slight.However, when forecasting 60 or 120 minutes into the future, these methods prove especially beneficial.Further, unlike other methods from the class of univariate statistical models, the four implemented neural network models consistently out-perform the naïve method.
Within the class of univariate neural networks, the difference between N-HiTS and N-BEATS is slight, although N-HiTS exhibits a fractionally smaller MAPE across all vital parameters and forecast horizons both for the internal and the external test sets.Among the univariate statistical models, the AR NNet displays the best forecast accuracy for forecasting horizons larger than 5.However, for forecasts 5 minutes ahead, its performance is sub-par.Like GRU, Transformer, N-HiTS and N-BEATS, AR NNet's performance relative to the naïve model consistently improves with increasing forecast horizon.
ARIMA performs second best among univariate statistical models, but the improvement upon the naïve model is mostly negligible.
ETS and Theta perform poorly.For large forecast horizons of 60 or 120 minutes, they are consistently the worst performing models, and drastically underperform even when compared to the naïve model (Figs 5 and 6).

Discussion
In this study, we applied forecasting methodologies of varying complexity to vital parameters tracked in the ICU following cardiothoracic surgery and compared the achieved forecast accuracies.Data originated from roughly 32,000 episodes of ICU care from a German heart center and the American multicenter eICU collaborative research database.With six vital parameters per patient, nine forecasting models, and forecast horizons ranging from 5 to 120 minutes, we established the most extensive set of benchmarks of forecast performance in the ICU to date.
We found modern neural network techniques to be far superior to conventional statistical models, with a widening performance gap for higher forecast horizons.While forecasting is a statistical learning problem where additional model complexity is oftentimes not rewarded with improved forecast accuracy [13,14], our findings support the initial hypothesis that the ICU is a setting primed for successful application of neural networks for forecasting: Despite high volatility and unpredictability on an individual patient level, there are common elements between patients and their vital parameter curves that can be learned and exploited.Firstly, certain dynamics, correlations, and limits of human physiology apply to all patients and can be approximated by a model.Secondly, treatment decisions by the medical staff adhere to rigorous guidelines, with the goal of achieving or maintaining vital parameter targets conductive to patient recovery.For example, for most patients lower end target values for mean blood pressure and oxygen saturation are 65 mmHg [32] and 92% [33], respectively.By training multivariate neural networks on close to 20,000 ICU admissions, our neural network models were given ample chance to detect, and at prediction time reflect these patterns.Our findings could therefore be seen as further evidence for the advantages of simultaneously learning from many independent, yet related examples (cross-learning) in time-series forecasting [34].
Their inability to perform cross-patient learning might also be at the core of the inability of some of the implemented univariate statistical models to produce better results than the naïve method.An over-reliance on recent trends could explain poor forecast performance, especially for longer forecast horizons, of the Theta method and ETS.Since these models are fitted independently for each patient, they have no way of judging whether certain trends and patterns ever occur in the ICU, where patients are under constant surveillance with the ability to intervene at short notice in case of unwanted developments.
Regarding the superiority of multivariate over univariate neural network models, we argue that modeling all vital parameters at once diminishes negative effects of missing values or artefacts not detected in the initial data cleaning step, so long as they do not apply to all vital parameters at the same time.Multivariate modeling further adds the potential to draw conclusions on temporal dynamics across vital parameters, e.g. if a worsening condition manifests itself in some vital parameters faster than in others.
Given the recent success of Transformer architectures in tasks requiring the processing of sequential data [20,35], our findings regarding the superiority of the GRU to the Transformer in ICU forecasting is somewhat surprising.While the Transformer is competitive and performs better than univariate models in ICU forecasting, it has not proven to be the state-ofthe-art in this specific application.Whether a different choice of Transformer architecture could remedy this shortcoming, or whether we have found the sweet spot of model complexity with the GRU remains speculative.
Regarding the data source used in this study for external validation, we see great potential for the eICU collaborative research database to become one of the benchmark datasets for multivariate time-series forecasting.Due to its large number of ICU stays, multi-center structure, and accessibility for research, evaluation on the eICU database is a formidable test of a model's ability to generalize well and robustly.

Limitations
Our work exhibits five key limitations, which should be addressed in future work.First, our forecasts were based solely on the vital parameters in question.However, for most patients in intensive care, the health status is captured through additional data sources which we left untapped in this study.Blood tests, medication, diagnoses, information on the preceding surgery or mechanical circulatory support might contain valuable information which could be exploited to further improve forecast accuracies.As currently constructed, the neural networks trained on thousands of ICU stays might have indirectly learned to price in interventions by the medical staff.Adding medication as external features might decrease the risk of models falsely predicting trends that are in truth the result of pharmacological intervention.We refrain from adding external features at this point because a) they are not needed to establish benchmarks or prove the superiority of modern neural network techniques in ICU forecasting, which we were able to demonstrate even without external features, b) their addition comes with considerable data processing work, especially in multicenter datasets that use non-standardized terms and units of measurement, and c) the process of distinguishing high-impact features that improve forecast performance from those that merely add complexity without additional useable information constitutes a separate research project.
Second, our models produced only point forecasts, whereas in practice prediction intervals might be required to indicate the uncertainty associated with each forecast.While obtaining prediction intervals from statistical models like ARIMA and exponential smoothing is straightforward under strong assumptions and implemented in frameworks like R's forecast package [24], it is a much more complex endeavor for recurrent neural networks or Transformers, and requires the implementation of Bayesian modeling, quantile regression, repeated sampling techniques or conformal predictions [36,37].Given the diverse approaches to this problem and the difficulty of evaluating intervals due to their inherent tradeoff between width and coverage rate [38], the derivation of prediction intervals is an ongoing research topic that deserves dedicated exploration.
Third, we developed and validated our models on a highly specific patient cohort, postoperative cardiothoracic surgery patients.Whether our findings translate to other ICU cohorts, and whether specific cohorts require specific forecast models remains to be tested.
Fourth, we evaluated our models strictly on quantitative performance in terms of deviation from the ground truth, regardless of medical context or interpretation of predicted changes.How specific medical scenarios influence forecasting, and in which cases it could be most beneficially employed is beyond the scope of this benchmark study.
Fifth and finally, this study does not address practical questions regarding optimal visualization and operationalization in real world ICU settings.On the path towards forecast-guided decision support in the ICU, details regarding the optimal frequency of forecasts, the most beneficial forecast horizon and the visual or acoustical highlighting of forecasts exceeding specified limits need to be customized to different medical domains and ICU workflows.

Conclusions
Generating accurate short-term forecasts of vital parameters for postoperative patients in intensive care is feasible, and displaying forecasts to the medical staff offers the potential to improve patient care through foresighted decision-making.Whilst therapeutic interventions will never be solely based on model-based forecasts, we see the potential for further development into a decision support tool for automated early warning to trigger a review by the patient's caregiver.
Regarding the choice of forecast models, modern multivariate neural networks achieve forecast accuracies superior to univariate time series models.Due to their demonstrated advantage in terms of accuracy, and their potential for further improvements through incorporating external parameters such as administered medication or laboratory results, they should be a focus for the further development of forecasting techniques in the ICU.

Fig 1 .
Fig 1.Schematic illustration of the forecasting setup.Synthetic example of three successive forecasts produced by the same method for one simulated vital parameter.Layout of the figure inspired by [11].https://doi.org/10.1371/journal.pdig.0000598.g001

Fig 2 .
Fig 2.An example of forecasts produced by four different models at a single timepoint for a single patient for all six vital parameters.Times are shifted for additional de-identification of the patient.The ground truth is displayed in black.https://doi.org/10.1371/journal.pdig.0000598.g002

Fig 3 .
Fig 3. Mean absolute percentage error (MAPE) across all vital parameters and forecast horizons in the internal test set.The best performing model (in terms of MAPE) is indicated by *.Error bars indicate the 95% confidence interval and were calculated from 250 bootstrap samples.https://doi.org/10.1371/journal.pdig.0000598.g003

Fig 4 .Fig 5 .Fig 6 .
Fig 4. Mean absolute percentage error (MAPE) across all vital parameters and forecast horizons in the external test set (eICU).The best performing model (in terms of MAPE) is indicated by *.Error bars indicate the 95% confidence interval and were calculated from 250 bootstrap samples.https://doi.org/10.1371/journal.pdig.0000598.g004 Pollard T, Johnson A, Raffa J, Celi L A, Badawi O, Mark R. eICU Collaborative Research Database [version 2.0].PhysioNet.2019.Available from: https://doi.org/10.13026/C2WM1R)can be obtained by researchers upon request and completion of the required training course.For more information, see https://eicu-crd.mit.edu/gettingstarted/access/.The software code developed as part of this study is available at https://github.com/nhinrichsberlin/icuvital-parameter-forecasting.This work was supported by the Berlin Institute of the Foundations of Learning and Data (BIFOLD).The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.Pia Lanmueller declares relevant financial activities outside the submitted work with Abbott GmbH und Abiomed Inc. Felix Balzer reports funding from Medtronic and grants from the German Federal Ministry of Education and Research, grants from the German Federal Ministry of Health, grants from the Berlin Institute of Health, personal fees from Elsevier Publishing, grants from Hans Bo ¨ckler Foundation, other from Robert Koch Institute, grants from Einstein Foundation, and grants from Berlin University Alliance, all outside the submitted work.Carsten Eickhoff is co-founder of Codiag AG and x-cardiac GmbH, and reports consulting fees from Abacus Health and speaker fees from Merck, all outside the submitted work.

Table 1 . Upper/lower thresholds used for data cleaning.
Values exceeding these limits were treated as missing.